Abstract

This paper presents methods to provide an optimal evaluation of nuclear
masses. The techniques used for this purpose come from data assimilation
(DA), which combines, in an optimal and consistent way, information coming
from experiment and from numerical models. Using all the available
information not only improves the mass evaluations but also decreases their
uncertainties. Each newly evaluated mass value comes with an uncertainty
that is noticeably smaller than the one given in the tables, especially for
the less well-known masses. In this paper, we first introduce a useful DA
tool, the Best Linear Unbiased Estimation (BLUE). The BLUE method is then
applied to nuclear mass tables and some improved results are shown. Finally,
post-validation diagnostics, which demonstrate that the method has been used
in optimal conditions, are described and used to validate the results.

Keywords: Data assimilation, Best Linear Unbiased Estimation, BLUE, 
nuclear masses, mass tables 

1 Introduction 

Mass tables provide an evaluation of the mass of every known and predicted
nucleus, and they are very important in nuclear physics. The information
gathered in those tables by experimentalists and from various nuclear mass
models (for example the "Finite-Range Liquid-Drop Model" [1] [2] or the
"Finite-Range Droplet Model" [3]) is used for reaction planning and for
simulations of nuclear reactions. An accurate knowledge of the masses of the
nuclei thus makes high-quality calculations and planning possible.

The purpose of this paper is to present a method to optimally evaluate the
masses of known nuclei, together with their associated accuracy. The aim is
to produce an improved set of nuclear mass data, with better accuracy than
the tabulated ones. This general approach is already applied in other fields
of science, for example in meteorology, climatology or oceanography. The
procedure proposed here is the same as the one climatologists use to obtain
high-accuracy meteorological data, as is the case for the widely used
meteorological re-analysis ERA-40 [3].

Although a frontier often exists between experimental and theoretical
results, both provide pieces of information on the same reality. Taken to
the extreme, two positions can be held. The first is that only theory is
valuable, so that all experiments are useless. The opposite position claims
that only experiment is meaningful, so that there is no need to run
simulations to forecast or to explain within a theoretical framework.
Obviously, both points of view are too restrictive, and ideally both kinds
of information (theoretical and experimental) need to be merged to describe
the physics more accurately. Data assimilation (DA) is precisely a general
method to handle experimental data and numerical modelling information
jointly in order to estimate the optimal values. Moreover, DA techniques at
the same time improve the accuracy of the estimation with respect to the
original data.

In this paper, we first develop some aspects of the theory and the basic
concepts of DA. DA in fact covers a large number of techniques. Here we
focus on the Best Linear Unbiased Estimation (BLUE) technique, which fits
the present problem very well. We use this BLUE technique to estimate the
nuclear masses of the known nuclei found in the classical mass tables. This
leads to a new set of nuclear masses with improved accuracy. One could
always object that the evaluation obtained by this procedure is dominated by
some assumption. Thus, in the last part of this paper, we focus on the post
validation of the DA method, which shows that the optimal estimation
obtained by DA is not dominated by its intrinsic assumptions.

2 Data assimilation 

We briefly introduce the theory of DA. DA is a wide domain, however, and we
will not present here the advanced techniques that include the dynamics of
the process, which are for example the basis of today's operational
meteorological forecasts. It is through advanced DA methods that long-term
weather forecasting has been drastically improved in the last 30 years.
Improvements in this field are constant, and nowadays 3-day weather
forecasts are as reliable as 1-day forecasts were twenty years ago. Such a
procedure uses all the recently available data, such as satellite
measurements, as well as sophisticated numerical models. Some interesting
information on these approaches can be found in references [5] [6] [7].

The ultimate goal of DA methods is to figure out the inaccessible true value
of the system state, denoted $x^t$, with the index t for "true". The basic
idea of DA is to put together information coming from an a priori on the
state of the system (usually called $x^b$, with b for "background") and
information coming from measurements (denoted $y$). The result of DA is
called the analysis $x^a$, and it is an estimation of the true state $x^t$
we want to find.

Some tools are necessary to achieve such a goal. As the mathematical space
of the background and that of the observations are not necessarily the same,
a bridge between them needs to be built. This is the so-called observation
operator $\mathcal{H}$, with its linearization $H$, which transforms values
from the space of the background to the space of observations. The
reciprocal operator is the adjoint of $\mathcal{H}$, which in the linear
case is the transpose $H^T$ of $H$.

Two other ingredients are necessary. The first one is the covariance matrix
$R$ of the observation errors, which are $\epsilon^o = y - \mathcal{H}(x^t)$.
It can be obtained from the known errors on the unbiased measurements. The
second one is the covariance matrix $B$ of the background errors, which are
$\epsilon^b = x^b - x^t$. It represents the error on the a priori, assuming
it to be unbiased. There are many ways to obtain these observation and
background error covariance matrices. Commonly, however, they are the output
of a model together with an evaluation of its accuracy, or the result of
expert knowledge.

To find the optimal value $x^a$, the underlying idea is to minimise the
variance of the error $\epsilon^a = x^a - x^t$ associated with this value.
It can then be proved that, within this formalism, the Best Linear Unbiased
Estimator $x^a$ is given by the following equation:

$$x^a = x^b + K\,(y - H x^b) \qquad (1)$$

where K is the gain matrix: 

$$K = B H^T \left(H B H^T + R\right)^{-1} \qquad (2)$$

Moreover, we can obtain the analysis error covariance matrix $A$,
characterising the analysis errors $\epsilon^a$. This matrix can be
expressed from $K$ as:

$$A = (I - K H)\,B \qquad (3)$$

with $I$ the identity matrix. Note that one way to prove equation (2) is to
minimise the trace of the matrix $A$, which also proves that $x^a$ is the
optimal value we are looking for. The demonstration is detailed in
reference [7]. It can equivalently be proven through a maximum likelihood
hypothesis.
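
As an illustration of equations (1) to (3), a minimal NumPy sketch could look
as follows. The function name `blue` and the toy numbers are assumptions made
for this example and do not come from the mass tables.

```python
import numpy as np

def blue(xb, y, H, B, R):
    """Return the BLUE analysis xa and its error covariance A (equations 1-3)."""
    xb, y = np.asarray(xb, dtype=float), np.asarray(y, dtype=float)
    H, B, R = (np.asarray(m, dtype=float) for m in (H, B, R))
    S = H @ B @ H.T + R                      # innovation covariance H B H^T + R
    K = np.linalg.solve(S, H @ B).T          # gain K = B H^T S^{-1} (S and B symmetric)
    xa = xb + K @ (y - H @ xb)               # equation (1)
    A = (np.eye(len(xb)) - K @ H) @ B        # equation (3)
    return xa, A

# Toy example (hypothetical numbers): two state components observed directly.
xa, A = blue(xb=[0.0, 0.0], y=[1.0, 1.0],
             H=np.eye(2), B=np.eye(2), R=np.diag([1.0, 3.0]))
```

In this toy case the first component, observed with the same variance as the
background, moves halfway to the observation, while the second, less
accurately observed one moves only a quarter of the way.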

It is worth noting that solving equation (1) is equivalent to minimising the
following function $J(x)$, $x^a$ being the optimal solution:

$$J(x) = (x - x^b)^T B^{-1} (x - x^b) + (y - H x)^T R^{-1} (y - H x) \qquad (4)$$
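
The equivalence can be sketched in a few lines, assuming $H$ linear and $B$,
$R$ symmetric and invertible (assumptions of this sketch). Setting the
gradient of $J$ to zero gives:

$$\nabla J(x) = 2 B^{-1} (x - x^b) - 2 H^T R^{-1} (y - H x) = 0$$

$$\Rightarrow \quad x^a = x^b + \left(B^{-1} + H^T R^{-1} H\right)^{-1} H^T R^{-1} (y - H x^b)$$

The identity $H^T R^{-1} (H B H^T + R) = (B^{-1} + H^T R^{-1} H)\, B H^T$ then
shows that $(B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} = B H^T (H B H^T + R)^{-1}$,
which is the gain $K$ of equation (2).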

If we assume that the background $x^b$ is given by a model, and that the
covariance matrix $B$ comes from an error evaluation of the model, then we
can make some interesting remarks concerning equation (4). If we recall the
extreme assumptions on model and experiments mentioned in the introduction,
we notice that these cases are covered by the minimisation of this function
$J$. If we assume that the model is completely wrong, then the covariance
matrix $B$ is $\infty$ (or equivalently $B^{-1}$ is 0), and the minimum of
$J(x)$ is given by $x^a = H^{-1} y$, which corresponds directly to the
information given by the data alone. With the opposite assumption that the
data are useless, implying that $R$ is $\infty$, we obtain $x^a = x^b$. Such
an approach covers the whole range of assumptions we can state with respect
to the data.
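
These two limits can be checked numerically in the scalar case $H = 1$, where
the gain reduces to $K = \sigma_b^2 / (\sigma_b^2 + \sigma_o^2)$. The sketch
below uses hypothetical values and a helper named `scalar_analysis`, neither
of which comes from the paper.

```python
def scalar_analysis(xb, y, var_b, var_o):
    """Scalar BLUE with H = 1: xa = xb + K (y - xb), K = var_b / (var_b + var_o)."""
    gain = var_b / (var_b + var_o)
    return xb + gain * (y - xb)

xb, y = 10.0, 12.0                                    # hypothetical background and observation
print(scalar_analysis(xb, y, var_b=1e12, var_o=1.0))  # model distrusted: close to y
print(scalar_analysis(xb, y, var_b=1.0, var_o=1e12))  # data distrusted: close to xb
print(scalar_analysis(xb, y, var_b=1.0, var_o=1.0))   # equal trust: midpoint
```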

Then the main work is to evaluate as well as possible the observation operator 
H and the two covariance matrices B and R. We will proceed with this task 
within the framework of the mass tables. 

3 Application of data assimilation to the nuclear 
mass evaluation 

To build the Best Linear Unbiased Estimation of the mass tables, we work on
mass excesses instead of the masses themselves. The mass excess of a nucleus
is the difference between its actual mass and its mass number expressed in
mass units, $A \times u$, with $u$ the atomic mass unit, or "unified atomic
mass unit" (see Table A in [5] for information on this unit), and $A$ the
number of nucleons.
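
As a small worked example of this definition, the sketch below converts an
atomic mass given in units of $u$ into a mass excess in MeV. The conversion
factor (the CODATA 2018 energy equivalent of $u$) and the oxygen-16 mass used
in the example are assumptions brought in for illustration; they are not
taken from the tables used in this paper.

```python
U_IN_MEV = 931.49410242  # energy equivalent of the atomic mass unit, MeV (CODATA 2018, assumed)

def mass_excess_mev(mass_in_u, mass_number):
    """Mass excess = actual mass - A * u, returned in MeV."""
    return (mass_in_u - mass_number) * U_IN_MEV

print(mass_excess_mev(12.0, 12))            # carbon-12 defines u, so its mass excess is 0
print(mass_excess_mev(15.99491461957, 16))  # oxygen-16 (illustrative atomic mass value)
```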

For the background $x^b$, we take the reference mass excess values proposed
in [9] [10] by P. Möller, J.R. Nix, W.D. Myers, and W.J. Swiatecki, focusing
on the masses obtained from the Finite-Range Droplet Model [3] with shell
energy correction. Note that the results are very similar to the ones
obtained from the Finite-Range Liquid-Drop Model [1] [2]. In those
theoretical tables, we limit ourselves to the known nuclei heavier than
oxygen, that is with proton number Z > 8, which is the lowest reliable value
for the model.

The experimental mass excess values are the ones from the reference tables
of G. Audi et al. [11] [8] [12] [13]. This remarkable collection of data
also includes the error associated with each measurement. These values
represent the observation, denoted $y$ in the previous section.

These two data series of mass excess values, $x^b$ and $y$, are in the same
space. The required observation operator $H$ therefore obviously reduces to
the identity $I$.

The covariance matrix $R$ of observation errors is chosen to be a diagonal
matrix. On the diagonal, we put the variances given by the known
uncertainties of the experimental tables [11] [8] [12] [13].

Figure 1: Percentage of change of the experimental mass excess values as a
function of the neutron and proton numbers of the nuclei

The evaluation of the background error covariance matrix $B$ is more
complicated. The first assumption is that the background errors are
independent of one another, so that $B$ reduces to a diagonal matrix. As no
information yet exists on the model accuracy, the second assumption is that
this accuracy is the same for every nucleus we consider. Thus we only need
one global accuracy value, denoted $\sigma_b$, and it remains to evaluate
the unique value $\sigma_b^2$ on the diagonal. For this purpose, we made a
statistical study. The error between $x^b$ and $x^t$ is overestimated by the
difference $H x^b - y$, which is globally more variable. Using this
difference therefore penalises the estimated accuracy of the model. This is
the third assumption we make to obtain the model error evaluation. To
evaluate a value for the diagonal of $B$, we calculate the mean square of
$x^b - y$ over all the available nuclei. The mean square is
$\sigma_b^2 = 0.652904$ MeV$^2$, that is $\sigma_b = 0.808024$ MeV. This
variance is the value that we put on the diagonal of $B$.
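
The estimation of $\sigma_b^2$ as the mean square of $x^b - y$, together with
the mean of the same difference as a simple bias check, can be sketched as
below. The two short arrays are made-up placeholders, not values from the
mass tables; only the last line uses numbers quoted in the text.

```python
import numpy as np

def background_error_stats(xb, y):
    """Return (mean, mean square) of xb - y: a bias check and the sigma_b^2 estimate."""
    diff = np.asarray(xb, dtype=float) - np.asarray(y, dtype=float)
    return diff.mean(), np.mean(diff ** 2)

# Placeholder mass excesses in MeV (hypothetical, for illustration only).
xb = [-4.2, -60.1, 12.9, 30.5]
y = [-4.7, -59.2, 13.4, 29.9]
bias, var_b = background_error_stats(xb, y)
sigma_b = np.sqrt(var_b)

# Consistency of the values quoted in the text: sqrt(0.652904) is about 0.808024.
print(np.sqrt(0.652904))
```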

These constructions of the $B$ and $R$ matrices are based on hypotheses that
are realistic, but one can always claim that they are not correct or too
loose. To support them, in section 4 we present a post validation of the
matrices $B$ and $R$, which shows that they are reliable in the context of
this framework and of the known data.

For information, it is interesting to look at the calculated mean value of
$x^b - y$, which is equal to -0.058326 MeV. It is fairly close to 0, which
supports the implicit hypothesis of a quasi-unbiased case for the background
error estimation.

Note that the choice of building $B$ and $R$ as diagonal matrices implies
that only the nuclei included in the DA procedure are affected by it.

All the required data are then available to build the DA analysis as
described in section 2, by simply applying formula (1). The differences
between the experimental results $y$ and the analysed results $x^a$ obtained
with DA are shown in Figure 1, as a percentage of change of the mass excess
for all the nuclei, as a function of their proton and neutron numbers.
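
With $H = I$ and both $B$ and $R$ diagonal, equations (1) to (3) decouple
nucleus by nucleus. Writing $\sigma_{y,i}$ for the experimental uncertainty
of nucleus $i$ (a notation introduced here for convenience), a short
derivation gives:

$$K_{ii} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_{y,i}^2}, \qquad x^a_i = x^b_i + K_{ii}\,(y_i - x^b_i), \qquad \sigma_{a,i}^2 = (1 - K_{ii})\,\sigma_b^2 = \frac{\sigma_b^2\,\sigma_{y,i}^2}{\sigma_b^2 + \sigma_{y,i}^2}$$

Each analysed value is thus a weighted average of the model value and of the
measurement, with a weight that goes to the measurement when
$\sigma_{y,i} \ll \sigma_b$ and to the model when $\sigma_{y,i} \gg \sigma_b$.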

In Figure 1, it can be noticed that the relative modification of the masses
ranges from as low as $10^{-10}$ % up to 80% in absolute value. This wide
range of modification is easily explained within an assimilation process. On
the one hand, when the masses of the nuclei are known very accurately, as
for stable nuclei, the information provided by the background (model) does
not contribute much, the mass excesses do not change, and the relative
modification is around $10^{-10}$ %. On the other hand, if the mass excesses
of the nuclei are not known very accurately, the mass excesses given by the
model bring much more information. The information on the physical
properties of the nuclei included in the model then drives the measured
value toward a new value that is more likely (in the sense of the maximum
likelihood principle included in the DA method), and the modification can be
up to 80%. Thus, as we notice in Figure 1, the farther we go from the valley
of stability, the larger the modification of the mass excesses can be,
because the measurements are less accurate.

Considering those notable modifications, it is worth taking into account the
new results for the mass excess. We have checked that all results are
correctly enclosed between the background predictions and the experimental
values. This means that we never overshoot either the experimental value or
the one given by the model, so the analysed value respects the information
provided by both. To understand this point better, let us assume that an
analysed value is not enclosed between the data and model values. This would
mean that some information was given to the procedure to push it towards
another unknown value. As there is no other source of information, not even
a correlation between the errors of different nuclei in the present case,
the analysis values need to be enclosed between the experimental value and
the model prediction. Thus, whatever the estimated value is, it is
consistent with the provided information.
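
This enclosure property can be checked directly in the present diagonal
setting, since the gain of each nucleus lies strictly between 0 and 1. The
sketch below uses randomly generated placeholder data and the value
$\sigma_b = 0.808024$ MeV quoted above; the array names are assumptions of
this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
xb = rng.normal(0.0, 50.0, n)              # placeholder model mass excesses (MeV)
sigma_y = rng.uniform(1e-4, 2.0, n)        # placeholder experimental uncertainties (MeV)
y = xb + rng.normal(0.0, 1.0, n)           # placeholder measurements (MeV)
sigma_b = 0.808024                         # background standard deviation from the text (MeV)

gain = sigma_b**2 / (sigma_b**2 + sigma_y**2)   # diagonal of K, always in (0, 1)
xa = xb + gain * (y - xb)                       # equation (1) with H = I

lower, upper = np.minimum(xb, y), np.maximum(xb, y)
enclosed = np.all((xa >= lower) & (xa <= upper))
print(enclosed)
```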

A key point is the accuracy since, by construction of the method, DA
improves it. The diagonal of the matrix $A$ (which, in the present case, is
a diagonal matrix) contains the variance $\sigma_a^2$ of the analysis for
each nucleus. We look at the percentage of evolution of the accuracy with
respect to the experimental accuracy $\sigma_y$ for each nucleus. Thus we
can construct the following accuracy indicator, expressed in percent:

$$100 \times \frac{\sigma_y - \sigma_a}{\sigma_y}$$

With such a definition, an improvement of the accuracy (that is, a decrease
of $\sigma_a$ with respect to $\sigma_y$) is a positive percentage. As we
are only interested in the evolution of this value, we make a histogram plot
where each bin represents the accuracy indicator for one nucleus, to get a
1D representation of all the results. To obtain such a one-dimensional plot,
we consider the nuclei ordered in the same way as in the reference file
[13]. Thus the first bin corresponds to the first nucleus of the table of
G. Audi et al., and so on. The results are presented in Figure 2.

From Figure 2, we confirm that all values are positive, which means that
there is always an improvement of the accuracy. The improvement can be up to
roughly 70% in some cases. As for the analysed values themselves, this means
that when an experimental value is known very accurately, a lot of effort is
required to do better. On the contrary, if the original accuracy of the data
is not so good, it is easy to improve it by providing only a little
additional information.
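
To make the link between the experimental uncertainty and the gain in
accuracy concrete, the sketch below evaluates the indicator for a few
hypothetical experimental uncertainties. It assumes the indicator
$100 \times (\sigma_y - \sigma_a)/\sigma_y$, the diagonal form of equation
(3) with $H = I$, and the value $\sigma_b = 0.808024$ MeV quoted above; the
function name and the sample uncertainties are illustrative.

```python
import math

SIGMA_B = 0.808024  # background standard deviation quoted in the text (MeV)

def accuracy_improvement_percent(sigma_y, sigma_b=SIGMA_B):
    """Percentage decrease of the analysis error sigma_a relative to sigma_y."""
    var_a = (sigma_b**2 * sigma_y**2) / (sigma_b**2 + sigma_y**2)  # diagonal of A
    return 100.0 * (sigma_y - math.sqrt(var_a)) / sigma_y

for sigma_y in (0.001, 0.1, 0.808024, 2.5):   # hypothetical uncertainties (MeV)
    print(sigma_y, round(accuracy_improvement_percent(sigma_y), 3))
```

Under these assumptions, a very precise measurement is barely improved,
while a measurement less precise than the model gains the most: an
experimental uncertainty of 2.5 MeV would give an improvement close to 69%.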

Considering Figure 2 from an experimental point of view, we notice that DA
provides values with better accuracy, especially on the weak points where
the experimental values are inaccurate due to the difficulties of
measurement.

Figure 2: Percentage of improvement of the experimental error by the
analysed error

Globally speaking, then, we can say that the DA method applied to nuclear
mass tables is very successful, and leads within a simple framework to
significant improvements of the mass excesses of the nuclei and of their
associated errors.

4 Post validation of the hypothesis 

Establishing the $R$ and $B$ matrices is, globally speaking, a difficult
task. This has been known for a long time in meteorology, oceanography and
other domains. The construction of those matrices is fundamental for the DA
process. Thus, checking methods have been developed for the post validation
of those important components. The details of those methods and some
applications are presented in references [14] [15]. We introduce them here
to discuss their application and results.

We define the difference between the observations and the background by:

$$d = y - H x^b \qquad (5)$$

It can be proved that:

$$\mathbb{E}(d d^T) = R + H B H^T \qquad (6)$$

where $\mathbb{E}$ is the mathematical expectation. Then, from that formula,
two others follow rather directly:

$$\mathbb{E}((I - H K)\, d d^T) = R \qquad (7)$$

and:

$$\mathbb{E}(K d d^T) = B H^T \qquad (8)$$
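
Both relations follow from equation (6) and the definition (2) of the gain,
treating $K$ and $H$ as fixed matrices so that they can be taken out of the
expectation:

$$\mathbb{E}(K d d^T) = K\,(R + H B H^T) = B H^T (H B H^T + R)^{-1} (H B H^T + R) = B H^T$$

$$\mathbb{E}((I - H K)\, d d^T) = (R + H B H^T) - H\,\mathbb{E}(K d d^T) = R + H B H^T - H B H^T = R$$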

From those two formulas (7) and (8), we can check and validate the
hypotheses we made. First, it is assumed that the best $R$ matrix is
proportional to the originally established one, which we call $R_0$. Because
the diagonal of the matrix contains variances, we need a positive
coefficient, chosen to be a square term $s_o^2$. Thus, we have:

$$R = s_o^2 R_0 \qquad (9)$$

To obtain that coefficient, taking the trace of both sides of equation (7)
leads (see [14] [15] for details) to the following result:

$$s_o^2 = \frac{\mathrm{Trace}(\mathbb{E}((I - H K)\, d d^T))}{\mathrm{Trace}(R_0)} \qquad (10)$$

We can follow a similar process with the background error matrix $B$,
assuming that:

$$B = s_b^2 B_0 \qquad (11)$$

In this case, the situation is different, as we do not have a direct
equation on $B$ but only on $B H^T$. Thus, we work on the $H B H^T$ matrix,
assuming, as in the previous equation, that:

$$\mathrm{Trace}(H B H^T) = s_b^2\, \mathrm{Trace}(H B_0 H^T) \qquad (12)$$

According to equation (8) we have:

$$\mathrm{Trace}(H B H^T) = \mathrm{Trace}(\mathbb{E}(H K d d^T)) \qquad (13)$$

The right side of the equation can be rewritten as:

$$\mathrm{Trace}(\mathbb{E}(H K d d^T)) = \mathbb{E}(d^T H \delta_a) \qquad (14)$$

with $\delta_a = x^a - x^b$. Then we can express $s_b^2$:

$$s_b^2 = \frac{\mathbb{E}(d^T H \delta_a)}{\mathrm{Trace}(H B_0 H^T)} \qquad (15)$$

From equations (9) and (11) we notice that, if $s_o$ and $s_b$ are equal to
1, the estimations of $B$ and $R$ are perfectly balanced with respect to the
BLUE equations. This method can be applied iteratively to improve the
quality of the matrices until it converges to a steady state with both
values of $s_o$ and $s_b$ close to 1.
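
In the present setting ($H = I$, diagonal $B$ and $R$), one step of this
diagnostic can be sketched as follows. The expectation is replaced by a sum
over the nuclei of a single data set, which is an assumption of this sketch,
as are the function name and the synthetic check at the end; the numerator
used for $s_o^2$ is the trace of equation (7).

```python
import numpy as np

def consistency_coefficients(d, var_b, var_o):
    """Return (s_o^2, s_b^2) for H = I and diagonal B0, R0 given by their variances."""
    d = np.asarray(d, dtype=float)
    var_b = np.broadcast_to(np.asarray(var_b, dtype=float), d.shape)
    var_o = np.broadcast_to(np.asarray(var_o, dtype=float), d.shape)
    gain = var_b / (var_b + var_o)                # diagonal of K
    delta_a = gain * d                            # x^a - x^b = K d
    residual = d - delta_a                        # y - x^a = (I - K) d
    s_o2 = np.sum(d * residual) / np.sum(var_o)   # trace of (7) over Trace(R0)
    s_b2 = np.sum(d * delta_a) / np.sum(var_b)    # equation (15)
    return s_o2, s_b2

# Synthetic check: innovations whose squares match var_b + var_o exactly
# are perfectly consistent, so both coefficients equal 1.
var_o = np.array([0.01, 0.25, 1.0, 4.0])
var_b = 0.652904
d = np.sqrt(var_b + var_o)
s_o2, s_b2 = consistency_coefficients(d, var_b, var_o)
```

For the iterative procedure, one would rescale `var_o` by `s_o2` and `var_b`
by `s_b2` and repeat until both coefficients stay close to 1.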

Let us first evaluate the quality of the modelling at the first step. The
results are the following: $s_b = 0.97445$ and $s_o = 1.3007$. If we go on
with the iterations, a satisfactory converged state is reached at the 6th
one. Looking only at the first-step results, we can already claim that the
modelling of the $B$ matrix is very satisfactory. For the $R$ matrix the
result is not as good, but still quite acceptable for an assimilation
procedure. It is worth recalling that the values $s_b$ and $s_o$ are linked
together, so a modification of $R$ or of $B$ impacts both of them. Globally
speaking, the results of the post processing are satisfactory and do not
reveal a discrepancy of several orders of magnitude, as can happen in such a
process. In particular, we are strongly reassured about the validity of the
most delicate hypothesis, the one made on the $B$ matrix.

5 Conclusion 

The DA technique applied to the mass tables appears to be very successful.
The new set of mass data produced is reliable:

• it is shown to lie within the limits previously given by theory and
experiment,

• the uncertainties on the mass excesses are lower than the ones previously
known, which makes them more suitable for use,

• the post validation is satisfactory and validates the hypotheses made to
build the data set.

We have thus described the generation of an optimal set of masses that can
be used whenever mass table information is needed, as was done in
climatology with the ERA-40 re-analysis [3].

As a perspective, however, the present application shows only some limited
aspects of the possibilities of DA. In particular, the improvement of the
$B$ matrix can be studied in order to open the way to forecasting more
accurately the masses of yet unknown nuclei. This is still a tiny
improvement among the huge number of possibilities that DA can offer to the
field of nuclear physics.